Improving SMT quality with morpho-syntactic analysis
Authors
Abstract
In the framework of statistical machine translation (SMT), correspondences between the words in the source and the target language are learned from bilingual corpora on the basis of so-called alignment models. Many statistical systems use little or no linguistic knowledge to structure the underlying models. In this paper we argue that training data is typically not large enough to sufficiently represent the range of different phenomena in natural languages, and that SMT can benefit from the explicit introduction of some knowledge about the languages under consideration. The improvement in translation results is demonstrated on two different German-English corpora.
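The sparsity argument can be illustrated with a small sketch. It assumes IBM Model 1 as the alignment model (the abstract does not say which alignment models the paper uses), and it uses a hypothetical toy corpus plus a hypothetical `LEMMAS` lookup table in place of a real German morphological analyser. Each inflected form such as "gehe" or "gehst" occurs only once in the surface corpus, so its lexicon probabilities rest on a single sentence. Mapping the forms to a base form pools their evidence and shrinks the source vocabulary. This is one simple kind of linguistic knowledge, not necessarily the transformation the paper applies.

```python
from collections import defaultdict

NULL = "<NULL>"  # empty source word, so target words may stay unaligned

# Hypothetical lookup table standing in for a real German morphological analyser.
LEMMAS = {"gehe": "gehen", "gehst": "gehen", "schlafe": "schlafen"}


def lemmatize(tokens):
    """Map inflected German forms to base forms; unknown tokens pass through."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def train_ibm1(corpus, iterations=10):
    """EM estimation of IBM Model 1 lexicon probabilities t(e|f).

    corpus: list of (german_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (english_word, german_word) -> probability.
    """
    tgt_vocab = {e for _, es in corpus for e in es}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for fs, es in corpus:
            candidates = [NULL] + fs
            for e in es:
                # E-step: share one unit of count for e among all candidate source words.
                z = sum(t[(e, f)] for f in candidates)
                for f in candidates:
                    p = t[(e, f)] / z
                    count[(e, f)] += p
                    total[f] += p
        # M-step: renormalise expected counts so that sum over e of t(e|f) is 1.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return dict(t)


def best_translation(t, f):
    """Return the target word with the highest t(e|f) for source word f."""
    options = {e: p for (e, g), p in t.items() if g == f}
    return max(options, key=options.get)


def source_vocab(corpus):
    return {f for fs, _ in corpus for f in fs}


# Hypothetical toy German-English corpus.
corpus = [
    ("ich gehe".split(), "i go".split()),
    ("du gehst".split(), "you go".split()),
    ("wir gehen".split(), "we go".split()),
    ("ich schlafe".split(), "i sleep".split()),
]
lemma_corpus = [(lemmatize(fs), es) for fs, es in corpus]

if __name__ == "__main__":
    print(len(source_vocab(corpus)), len(source_vocab(lemma_corpus)))  # 7 5
    t_lemma = train_ibm1(lemma_corpus)
    print(best_translation(t_lemma, "gehen"))
```

In the lemmatized corpus, "gehen" co-occurs with "go" in three sentences instead of each inflected form seeing it once, so a single EM iteration already gives t(go|gehen) = 0.5 against 1/6 for each competing English word.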
Similar resources
Improving Phrase-Based SMT with Morpho-Syntactic Analysis and Transformation
This paper presents our study of exploiting morpho-syntactic information for phrase-based statistical machine translation (SMT). For morphological transformation, we use hand-crafted transformational rules. For syntactic transformation, we propose a transformational model based on Bayes’ formula. The model is trained using a bilingual corpus and a broad coverage parser of the source language. T...
Factor templates for factored machine translation models
In this paper, we present a method of avoiding the combinatorial explosion encountered in Factored Models during the construction of translation options caused by the large number of possible combinations of target language lemmas and morpho-syntactic factors. We automatically extract factor templates from a word-aligned annotated bilingual corpus and use them to distinguish which morpho-syntac...
Morphology In Statistical Machine Translation From English To Highly Inflectional Language
In this paper, we investigate the role of morphology in phrase-based statistical machine translation (SMT) from English to the highly inflectional Slovenian language. Translation to an inflectional language is a challenging task because of its morphological complexity. Rich morphology increases data sparsity and worsens the quality of statistical machine translation. The idea of the paper is to...
Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation
Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that using morpho-syntactic information is an effective solution to data sparseness. However, fewer efforts have been made to use English morpho-syntactic analysis in Chinese-to-English SMT. We found that while English is a language with less inflection, using English lemmas in ...
Bridging Morpho-Syntactic Gap between Source and Target Sentences for English-Korean Statistical Machine Translation
Often, Statistical Machine Translation (SMT) between English and Korean suffers from null alignment. Previous studies have attempted to resolve this problem by removing unnecessary function words, or by reordering source sentences. However, the removal of function words can cause a serious loss in information. In this paper, we present a possible method of bridging the morpho-syntactic gap for ...